13 research outputs found

    Multilingual Neural Machine Translation System for Indic to Indic Languages

    This paper presents an Indic-to-Indic (IL-IL) MNMT baseline model for 11 ILs, trained on the Samanantar corpus and analyzed on the Flores-200 corpus. All models are evaluated using the BLEU score. The languages are classified into three groups: East Indo-Aryan (EI), Dravidian (DR), and West Indo-Aryan (WI). The effect of language relatedness on MNMT model efficiency is studied. Owing to the presence of large corpora from English (EN) to ILs, MNMT IL-IL models using EN as a pivot are also built and examined. To achieve this, English-Indic (EN-IL) models are also developed, with and without the use of related languages. Results reveal that using related languages is beneficial for the WI group only, while it is detrimental for the EI group and shows an inconclusive effect on the DR group, but it is useful for EN-IL models. Thus, related language groups are used to develop pivot MNMT models. Furthermore, the IL corpora are transliterated from the corresponding scripts to a modified ITRANS script, and the best MNMT models from the previous approaches are built on the transliterated corpus. The use of pivot models greatly improves on the MNMT baselines, with AS-TA achieving the minimum BLEU score and PA-HI the maximum. Among languages, AS, ML, and TA achieve the lowest BLEU scores, whereas HI, PA, and GU perform the best. Transliteration also helps the models, with few exceptions. Across all languages, the largest score gains are observed for ML, TA, and BN, and the smallest average gains for KN, HI, and PA. The best model obtained is the PA-HI language pair trained on the PAWI transliterated corpus, which gives 24.29 BLEU. Comment: 38 pages, 2 figures

    Optimization Matrix Factorization Recommendation Algorithm Based on Rating Centrality

    Matrix factorization (MF) is extensively used to mine user preferences from explicit ratings in recommender systems. However, explicit ratings are not always reliable, because many factors may affect a user's final evaluation of an item, including commercial advertising and a friend's recommendation. Therefore, identifying the reliable ratings of a user is critical to further improving the performance of the recommender system. In this work, we analyze the degree to which each rating deviates from the overall rating distributions of the user and the item, and propose the notions of user-based rating centrality and item-based rating centrality, respectively. Moreover, based on the rating centrality, we measure the reliability of each user rating and provide an optimized matrix factorization recommendation algorithm. Experimental results on two popular recommendation datasets show that our method performs better than other matrix factorization recommendation algorithms, especially on sparse datasets.
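The idea of weighting each rating by a reliability score during factorization can be sketched as follows. The abstract does not define rating centrality, so the weight formula in `reliability_weights` is a hypothetical stand-in (ratings close to both the user's mean and the item's mean count more), and `weighted_mf` with all its hyperparameters is an illustrative assumption, not the paper's algorithm.

```python
import random
from collections import defaultdict


def reliability_weights(ratings):
    """ratings: list of (user, item, rating) triples.

    Hypothetical weight (NOT the paper's centrality definition):
    1 / ((1 + |r - user mean|) * (1 + |r - item mean|)).
    """
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_user[u].append(r)
        by_item[i].append(r)
    mu_u = {u: sum(rs) / len(rs) for u, rs in by_user.items()}
    mu_i = {i: sum(rs) / len(rs) for i, rs in by_item.items()}
    return [
        1.0 / ((1.0 + abs(r - mu_u[u])) * (1.0 + abs(r - mu_i[i])))
        for u, i, r in ratings
    ]


def weighted_loss(ratings, weights, P, Q):
    """Weighted squared error of the predictions P[u] . Q[i]."""
    total = 0.0
    for (u, i, r), w in zip(ratings, weights):
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        total += w * (r - pred) ** 2
    return total


def weighted_mf(ratings, weights, k=2, lr=0.02, reg=0.01, epochs=300, seed=0):
    """SGD matrix factorization in which each rating's error is scaled by its weight."""
    rng = random.Random(seed)
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(k)] for u, _, _ in ratings}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(k)] for _, i, _ in ratings}
    for _ in range(epochs):
        for (u, i, r), w in zip(ratings, weights):
            err = r - sum(a * b for a, b in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (w * err * qi - reg * pu)
                Q[i][f] += lr * (w * err * pu - reg * qi)
    return P, Q
```

With this weighting, a rating that matches both means gets weight 1.0, and the weight shrinks toward 0 as the rating deviates, so outlying ratings pull less on the learned factors.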

    A knowledge reuse framework for improving novelty and diversity in recommendations


    Effective data summarization for hierarchical clustering in large datasets


    P.: A distance based clustering method for arbitrary shaped clusters in large datasets. Pattern Recognition 44(12

    Clustering has been widely used in different fields of science, technology, social science, etc. Clusters in a dataset naturally have arbitrary (non-convex) shapes. One important class of clustering is the distance based method. However, distance based clustering methods usually find clusters of convex shapes. Classical single-link is a distance based clustering method that can find arbitrary shaped clusters. It scans the dataset multiple times and has a time requirement of O(n^2), where n is the size of the dataset. This is potentially a severe problem for a large dataset. In this paper, we propose a distance based clustering method, l-SL, to find arbitrary shaped clusters in a large dataset. In this method, the leaders clustering method is first applied to a dataset to derive a set of leaders; subsequently, the single-link method (with distance stopping criteria) is applied to the leaders set to obtain the final clustering. The l-SL method produces a flat clustering. It is considerably faster than the single-link method applied to the dataset directly. The clustering result of l-SL may deviate nominally from the final clustering of the single-link method (with distance stopping criteria) applied to the dataset directly. To compensate for this deviation, an improvement method is also proposed. Experiments are conducted with standard real world and synthetic datasets. Experimental results show the effectiveness of the proposed clustering methods for large datasets.
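The two-stage procedure described in this abstract (leaders clustering, then single-link with a distance stopping criterion on the leaders) can be sketched as below. This is a minimal illustration under assumptions: the function names, the leader threshold `tau`, and the cut distance `h` are hypothetical, Euclidean distance is assumed, and the improvement method mentioned in the abstract is not included.

```python
import math


def leaders(points, tau):
    """Single-pass leaders clustering.

    A point follows the first leader within distance tau;
    otherwise it becomes a new leader.
    """
    leader_list, followers = [], []
    for idx, p in enumerate(points):
        for k, leader in enumerate(leader_list):
            if math.dist(p, leader) <= tau:
                followers[k].append(idx)
                break
        else:
            leader_list.append(p)
            followers.append([idx])
    return leader_list, followers


def single_link_cut(items, h):
    """Single-link with a distance stopping criterion.

    Merging stops once the closest pair of clusters is farther apart than h,
    which yields the connected components of the graph whose edges join
    items at distance <= h (computed here with union-find).
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if math.dist(items[i], items[j]) <= h:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(items))]


def l_sl(points, tau, h):
    """Cluster the leaders with single-link, then give followers their leader's label."""
    leader_list, followers = leaders(points, tau)
    roots = single_link_cut(leader_list, h)
    labels = [None] * len(points)
    remap = {}
    for k, root in enumerate(roots):
        label = remap.setdefault(root, len(remap))
        for idx in followers[k]:
            labels[idx] = label
    return labels
```

The quadratic pairwise step runs over the leaders only, not over all n points, which is where the speed-up over plain single-link comes from when the number of leaders is much smaller than n.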

    A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data

    Collaborative filtering (CF) is the most successful approach for personalized product or service recommendations. Neighborhood based collaborative filtering is an important class of CF: a simple, intuitive, and efficient product recommender approach widely used in the commercial domain. Typically, neighborhood-based CF uses a similarity measure to find users similar to an active user, or products similar to those she has rated. Traditional similarity measures use only the ratings of co-rated items when computing the similarity between a pair of users. Therefore, these measures are not suitable for sparse data. In this paper, we propose a similarity measure for neighborhood based CF that uses all ratings made by a pair of users. The proposed measure finds the importance of each pair of rated items by exploiting Bhattacharyya similarity. To show the effectiveness of the measure, we compare the performance of neighborhood based CF using state-of-the-art similarity measures with that of CF based on the proposed measure. Recommendation results on a set of real data show that CF based on the proposed measure outperforms CF based on existing measures on various evaluation metrics.
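The Bhattacharyya coefficient between two discrete distributions p and q is the sum over outcomes of sqrt(p_h * q_h). The sketch below applies it to the rating histograms of two items. The 1 to 5 rating scale and the helper names are assumptions for illustration; how the paper combines these item-pair coefficients into its final user-user similarity is not stated in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def rating_distribution(ratings, scale=(1, 2, 3, 4, 5)):
    """Empirical distribution of an item's ratings over a discrete scale (assumed 1-5)."""
    n = len(ratings)
    if n == 0:
        return [0.0] * len(scale)
    counts = Counter(ratings)
    return [counts[r] / n for r in scale]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def item_similarity(ratings_i, ratings_j):
    """Similarity of two items from all of their ratings, with no co-rating users required."""
    return bhattacharyya(rating_distribution(ratings_i), rating_distribution(ratings_j))
```

Because the coefficient compares whole rating histograms, two items can be judged similar even when no single user has rated both, which is the property that matters for sparse data.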

    Hidden location prediction using check-in patterns in location-based social networks

    This is a post-peer-review, pre-copyedit version of an article published in Knowledge and Information Systems. The final authenticated version is available online at: https://doi.org/10.1007/s10115-018-1170-5
    The check-in facility in a Location Based Social Network (LBSN) enables people to share location information as well as real life activities. Analysing these historical series of check-ins to predict the future locations to be visited has been very popular in the research community. However, it has been found that people do not intend to share their privately visited locations and activities in an LBSN. Research into extrapolating unchecked locations from historical data is limited. Knowledge of hidden locations can have a wide range of benefits to society. It may help investigating agencies identify possible places visited by a suspect, a marketing company select potential customers for targeted marketing, medical representatives identify areas for disease prevention and containment, etc. In this paper, we propose an Associative Location Prediction Model (ALPM), which infers privately visited unchecked locations from a published user trajectory. The proposed ALPM explores the association between a user's checked-in data, the Hidden Markov Model, and proximal locations around a published check-in to predict the unchecked or hidden locations. We evaluate ALPM on the real-world Gowalla LBSN dataset for users residing in Beijing, China. Experimental results show that the proposed model outperforms the existing state-of-the-art work in the literature.